A controller for training a machine for automatizing lighting control actions and a method thereof

ABSTRACT

A method for training a machine for automatizing lighting control actions, wherein the method comprises the steps of: controlling one or more lighting devices based on a first set of control parameters; controlling the one or more lighting devices based on a second set of control parameters; wherein the second set of control parameters is different from the first set of control parameters; detecting presence of a user based on a presence signal output from a presence sensing means; monitoring a response of the user related to the second set of control parameters; wherein the response is monitored during a time period; evaluating feedback of the user based on the monitored response; wherein the feedback is positive if no active response has been monitored; training the machine based on the evaluated feedback.

FIELD OF THE INVENTION

The invention relates to a method for training a machine for automatizing lighting control actions. The invention further relates to a controller and a lighting system for training a machine for automatizing lighting control actions.

BACKGROUND

Connected lighting refers to a system of one or more lighting devices which are controlled not by (or not only by) a traditional wired, electrical on-off or dimmer circuit, but rather by using a data communications protocol via a wired or more often wireless connection, e.g. a wired or a wireless network. These connected lighting networks form what is commonly known as the Internet of Things (IoT) or more specifically the Internet of Lighting (IoL). Typically, the lighting devices, or even individual lamps within a lighting device, may each be equipped with a wireless receiver or transceiver for receiving lighting control commands from a lighting control device according to a wireless networking protocol such as Zigbee, Wi-Fi or Bluetooth.

Generally, these (connected) lighting systems are pre-programmed with recommended sets of lighting parameters, which are usually based on manual rules and on sensor coupling. These parameters are selected for achieving a desired light effect on an ‘average’ person in an ‘average’ environment. Machine learning algorithms can be used to optimize light effects for a user based on the user feedback. For finding an optimal light effect, such optimization algorithms require explicit feedback from the user for each automatic action they take. This approach, however, can be tedious for the user.

WO 2013/102881 discloses a method for determining user preference in light management of an area. The method comprises the steps of collecting data regarding artificial and natural light within the area, collecting data regarding weather conditions and collating the collected data with regard to temporal and geographical information associated with the space. The collected data is processed using stochastic processes to estimate the user's preferences at a specific time. Artificial lighting and natural lighting are adjusted based on the estimated user preferences. User inputs may be provided to indicate the user's satisfaction and/or dissatisfaction with the estimate. User preference settings are updated and maintained when the user is satisfied with the results.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide optimal light effects for a user in an environment. It is a further object to provide an improved non-obtrusive feedback and learning mechanism based on user preference with a minimal user interaction (feedback), such that the user preference is obtained in a more natural way.

According to a first aspect, the object is achieved by a method for training a machine for automatizing lighting control actions, wherein the method comprises the steps of: controlling one or more lighting devices based on a first set of control parameters; controlling the one or more lighting devices based on a second set of control parameters; wherein the second set of control parameters is different from the first set of control parameters; detecting presence of a user based on a presence signal output from a presence sensing means; monitoring a response of the user related to the second set of control parameters; wherein the response is monitored during a time period; evaluating feedback of the user based on the monitored response; wherein the feedback is positive if no active response has been monitored; training the machine based on the evaluated feedback.

The method comprises controlling one or more lighting devices based on a first and a second set of control parameters, respectively. The first and the second set of control parameters are (perceivably) different. For instance, when a first and a second light effect are rendered based on the first and the second set of control parameters, respectively, the difference between the first light effect and the second light effect is perceivable by a user. For example, the second set of control parameters may be selected with a magnitude substantially different from the first set of control parameters, resulting in a perceivable difference between the first and the second light effects. In another example, the duration of rendering the first and the second light effects is selected such that the difference becomes perceivable by a user. In an example, the controlling of the one or more lighting devices based on the first or the second set of control parameters may provide no light output from the one or more lighting devices, i.e. the one or more lighting devices are powered off. In this example, the perceivable difference may comprise a change from no light output to a light effect. The first and the second light effects may be rendered in an environment, such as an office, a factory, a house, etc.

The method further comprises detecting presence of a user based on a presence signal output from a presence sensing means. The presence detection may be based on one or more of: image recognition, RF-based presence sensing, infrared detection, radar, acoustic sensing, etc. The presence detection of a user may be based on a signal from a communication device, such as a mobile phone, computer, etc., assumed to be co-located with the user. RF-based sensing and/or image recognition may be used if the user does not carry a communication device and does not transmit or receive any signal. The presence detection may be performed in the same environment where the one or more lighting devices have been controlled based on the first and the second set of control parameters. The presence detection may be performed in the same environment where the first and the second light effects have been rendered. The method further comprises monitoring a response of the user related to the second set of control parameters. The monitoring may be performed during a time period. The monitoring may comprise observing the user reaction to the second set of control parameters. For monitoring the response, detecting the presence of the user is an important step. The time period may be predetermined or may be chosen on the fly.

The method further comprises evaluating feedback of the user based on the monitored response. The feedback is considered as positive if no active response has been monitored while the user presence has been detected. In an example, if the user is pleased with the rendered light effect, (s)he will keep the rendered light effect and will not ‘react’ to it, e.g. by not changing it. The ‘inaction’ of the user related to the second set of control parameters represents a positive feedback. Since the machine is trained based on the evaluated feedback which comprises ‘inaction’ as a positive feedback, while the user presence has been detected, a more natural and improved non-obtrusive method is provided to train a machine for optimizing light effect.
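The rule that inaction counts as positive feedback only while presence is detected can be sketched as follows. This is a minimal illustration and not the claimed implementation; the function name `is_positive_by_inaction` and its two boolean inputs are hypothetical simplifications of the presence signal and the monitored response.

```python
def is_positive_by_inaction(user_present: bool, active_response: bool) -> bool:
    """Return True when inaction may be read as positive feedback.

    Assumption (hypothetical simplification): both inputs summarize the
    whole monitoring time period as a single boolean each.
    """
    # Inaction is only meaningful if the user was there to observe the effect.
    if not user_present:
        return False
    # No active response while present is treated as positive feedback.
    return not active_response


if __name__ == "__main__":
    print(is_positive_by_inaction(user_present=True, active_response=False))
    print(is_positive_by_inaction(user_present=False, active_response=False))
```

An absent user yields no positive feedback here, since inaction without presence carries no information about preference.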

In an embodiment, controlling of the one or more lighting devices based on the second set of control parameters may be triggered based on the user presence detection.

In an example, when the presence of the user is detected, the one or more lighting devices are subsequently controlled based on the second set of control parameters. In this example, the user is advantageously present when the second light effect is rendered. Therefore, (s)he may differentiate between the first and the second light effects and may provide his/her feedback on the second light effect. In an example, the one or more lighting devices are not providing illumination (powered off), and then, based on the user presence, the one or more lighting devices are controlled to render a second light effect.

In an embodiment, the time period may start upon detecting the user presence and controlling the one or more lighting devices based on the second set of control parameters. In an embodiment, the time period may be ceased when the presence is no longer detected.

The time period to monitor the response of the user may be based on the user presence detection and on controlling of the one or more lighting devices based on the second set of control parameters. In an example, the time period may start when presence of the user is detected in the environment and when the second light effect is rendered in the same environment. Both of these conditions, i.e. the user presence and controlling of the one or more lighting devices based on the second set of control parameters, should be satisfied for starting of the time period. To assign a positive feedback to user inaction, the user must be present. Furthermore, the feedback should be based on the rendered second light effect. Therefore, the time period is advantageously started upon detecting the user presence and controlling the one or more lighting devices based on the second set of control parameters. In an example, the time period may be ceased when the presence is no longer detected. For instance, when the user leaves the environment, the ‘inaction’ no longer represents the user preference. The time period may be ceased when the controlling of the one or more lighting devices to render the second light effect has been ceased.
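The start and stop conditions of the time period can be illustrated with a small state holder. This is a sketch under stated assumptions: the class `MonitoringWindow`, its `update` method and the use of plain numeric timestamps are hypothetical and chosen only for illustration.

```python
from typing import Optional


class MonitoringWindow:
    """Tracks the time period during which the user response is monitored."""

    def __init__(self) -> None:
        self.start: Optional[float] = None  # set once both conditions hold
        self.end: Optional[float] = None    # set once either condition stops holding

    def update(self, t: float, user_present: bool, second_effect_active: bool) -> None:
        both = user_present and second_effect_active
        if self.start is None:
            # The period starts only when presence is detected AND the
            # second light effect is being rendered.
            if both:
                self.start = t
        elif self.end is None and not both:
            # The period ceases when presence is lost or the effect stops.
            self.end = t

    def duration(self) -> Optional[float]:
        if self.start is None or self.end is None:
            return None
        return self.end - self.start


if __name__ == "__main__":
    window = MonitoringWindow()
    window.update(0.0, user_present=True, second_effect_active=False)
    window.update(1.0, user_present=True, second_effect_active=True)
    window.update(5.0, user_present=False, second_effect_active=True)
    print(window.start, window.end, window.duration())
```

The order in which the two conditions become true does not matter in this sketch; the period begins at the first update where both hold.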

In an embodiment, the method may further comprise detecting an activity of the user; and the time period may be based on the detected activity such that the time period may be ceased when the activity is no longer detected.

When the user is not active, e.g. (s)he is sleeping, said inactivity does not represent the user preference related to the rendered second light effect. Therefore, an activity of the user may be detected, and the time period may be based on the detected activity. The time period may be ceased when the activity is no longer detected. It is understood that the detected activity of the user, for instance, if (s)he is playing a video game, sleeping, etc., is different from inaction of the user related to the second set of control parameters.

In an embodiment, the method may further comprise determining an identity of the user; and determining the second set of control parameters based on the determined identity.

In a multiuser environment, i.e. when multiple users are present in the environment, it is important to identify the user and train the machine according to the preference of the identified user. In this example, the second set of control parameters may be based on the identified user.

In an embodiment, the method may further comprise determining the second set of control parameters based on a prior evaluated feedback.

The training of the machine may be an iterative process, e.g. the second set of control parameters may be based on a prior evaluated feedback. For example, if the user prefers a light effect and has indicated it via a prior evaluated feedback, the second set of control parameters may be determined based on the prior light effect. For instance, if the user has indicated that (s)he prefers a high brightness level, the second set of control parameters may be determined such that the brightness level is in a high range. The training of the machine may iterate based on each evaluated feedback.

In an embodiment, the method may further comprise: receiving a signal indicative of a field of view of the user; determining one or more lighting devices with illumination in the field of view of the user; controlling the determined one or more lighting devices based on the second set of control parameters in the field of view of the user.

A field of view is an open observable area a user can see through his or her eyes or via an optical device. The one or more lighting devices may be located in the field of view of the user or at least have illumination in the field of view. Controlling of the determined one or more lighting devices based on the second set of control parameters is advantageously performed in the field of view of the user such that the user can observe the second light effect. If the rendered second light effect is not in the field of view of the user, the inaction of the user may not represent the user preference related to the second set of control parameters.

In an embodiment, an active response may comprise controlling the one or more lighting devices based on a third set of control parameters based on a received user input; and wherein the feedback is negative if said active response is monitored.

An active response of the user related to the second set of control parameters may be to modify the second set of control parameters to produce a third set of control parameters or to select a third set of control parameters being independent of the second set of control parameters and control the one or more lighting devices thereon. The controlling of the one or more lighting devices based on a third set of control parameters may be based on a received user input, e.g. via a legacy wall switch, voice command etc. In an example, the third set of control parameters is the first set of control parameters. In an alternate example, the third set of control parameters is different from the first set of control parameters. If the user does not like the second light effect, (s)he may control the one or more lighting devices to either render a different light effect or revert to the first light effect. Such monitored active response is considered as a negative feedback.

In an embodiment, an active response may comprise actuating at least one actuator, by the user, related to the rendered second light effect.

One of the other active responses related to the second set of control parameters may comprise actuating at least one actuator, e.g. a like or dislike button. For example, if the user actuates the like button, it is considered as a positive feedback, and if the user actuates the dislike button, it is considered as a negative feedback. The training of the machine is performed based on the evaluated feedback. The evaluation of the feedback (positive or negative) may be based on the context of the active response.
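The mapping from a monitored response to evaluated feedback, as described for inaction, a user-selected third set of control parameters, and like/dislike actuators, might be sketched as below. The enum members and the `evaluate_feedback` function are hypothetical names introduced for illustration only.

```python
from enum import Enum, auto
from typing import Optional


class Response(Enum):
    NONE = auto()           # no active response during the time period
    CHANGED_LIGHT = auto()  # user applied a third set of control parameters
    LIKE = auto()           # user actuated a like button
    DISLIKE = auto()        # user actuated a dislike button


# Feedback assigned to each response, following the description above.
FEEDBACK = {
    Response.NONE: "positive",
    Response.CHANGED_LIGHT: "negative",
    Response.LIKE: "positive",
    Response.DISLIKE: "negative",
}


def evaluate_feedback(response: Response, user_present: bool = True) -> Optional[str]:
    """Return 'positive' or 'negative', or None if no feedback can be evaluated."""
    # Assumption: without detected presence, nothing is evaluated.
    if not user_present:
        return None
    return FEEDBACK[response]


if __name__ == "__main__":
    for response in Response:
        print(response.name, evaluate_feedback(response))
```

A lookup table keeps the context-dependent evaluation in one place, so further actuator types could be added without touching the evaluation function.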

In an embodiment, the one or more lighting devices may be controlled based on the second set of control parameters prior to the user presence detection.

As an alternative to detecting user presence and subsequently controlling the one or more lighting devices to render the second light effect, the one or more lighting devices may be controlled to render the second light effect prior to the user presence detection. To optimize the light effect, the user does not need to observe the change from the first light effect to the second light effect; only observing the second light effect and providing feedback may be sufficient for optimization.

In an embodiment, the machine may be trained using reinforcement learning, and wherein the positive feedback may be a positive reward and the negative feedback may be a negative reward.

Machine learning algorithms such as reinforcement learning may be used to train the machine to optimize the light effect. Reinforcement learning is a machine learning approach to optimize a policy, e.g. light control actions, by maximizing an ultimate reward through feedback. The feedback may be in the form of positive rewards and negative rewards (or punishments), after a sequence of actions, e.g. rendering of the second light effect. The machine may be trained based on a positive reward if the feedback is positive and on a negative reward (punishment) if the feedback is negative.
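How positive and negative rewards could drive training can be illustrated with a deliberately simple, bandit-style value update. This is a swapped-in toy technique chosen for brevity, not the reinforcement learning scheme of the source; the class `PreferenceLearner`, the reward values of +1 and -1, and the learning rate are all assumptions.

```python
from collections import defaultdict
from typing import Hashable, Iterable

# Assumed reward values; the source only states positive vs. negative reward.
REWARD = {"positive": 1.0, "negative": -1.0}


class PreferenceLearner:
    """Keeps a running value estimate per set of control parameters."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.learning_rate = learning_rate
        self.value = defaultdict(float)  # unseen parameter sets start at 0.0

    def train(self, params: Hashable, feedback: str) -> None:
        reward = REWARD[feedback]
        # Incremental update: move the estimate toward the observed reward.
        self.value[params] += self.learning_rate * (reward - self.value[params])

    def best(self, candidates: Iterable[Hashable]) -> Hashable:
        # Pick the candidate with the highest current value estimate.
        return max(candidates, key=lambda p: self.value[p])


if __name__ == "__main__":
    learner = PreferenceLearner()
    learner.train("brightness_70", "positive")
    learner.train("brightness_30", "negative")
    print(learner.best(["brightness_30", "brightness_70"]))
```

Because each evaluated feedback triggers one update, the estimate for a parameter set shifts gradually as further feedback arrives, which mirrors the iterative training described above.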

According to a second aspect, the object is achieved by a controller for training a machine for automatizing lighting control actions; wherein the controller may comprise a processor arranged for executing the steps of the method according to the first aspect.

According to a third aspect, the object is achieved by a lighting system for training a machine for automatizing lighting control actions, comprising a plurality of lighting devices arranged for illuminating an environment, and a controller according to the second aspect.

According to a fourth aspect, the object is achieved by a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to the first aspect.

It should be understood that the computer program product and the system may have similar and/or identical embodiments and advantages as the above-mentioned method.

BRIEF DESCRIPTION OF THE DRAWINGS

The above, as well as additional objects, features and advantages of the disclosed systems, devices and methods will be better understood through the following illustrative and non-limiting detailed description of embodiments of systems, devices and methods, with reference to the appended drawings, in which:

FIG. 1 shows schematically and exemplary an embodiment of a system for training a machine for automatizing lighting control actions;

FIG. 2 shows schematically and exemplary an embodiment of a controller for training a machine for automatizing lighting control actions;

FIG. 3 shows schematically and exemplary a flowchart illustrating an embodiment of a method for training a machine for automatizing lighting control actions;

FIG. 4 shows schematically and exemplary a machine learning approach for training a machine for automatizing lighting control actions.

All the figures are schematic, not necessarily to scale, and generally only show parts which are necessary in order to elucidate the invention, wherein other parts may be omitted or merely suggested.

DETAILED DESCRIPTION OF EMBODIMENTS

FIG. 1 shows schematically and exemplary an embodiment of a system 100 with lighting device(s) 110 a-d for illuminating an environment 101. The environment 101 may be an indoor or outdoor environment, such as an office, a factory, a house, a grocery store, a hospital, a sports arena etc. The system 100 exemplary comprises four lighting devices 110 a-d. The lighting devices 110 a-d may be comprised in a lighting system. The lighting system may be a connected lighting system, e.g. Philips Hue, wherein the lighting devices 110 a-d may be connected to an external network, e.g. the Internet. A lighting device 110 a-d is a device or structure arranged to emit light suitable for illuminating an environment 101, providing or substantially contributing to the illumination on a scale adequate for that purpose. A lighting device 110 a-d comprises at least one light source or lamp (not shown), such as an LED-based lamp, gas-discharge lamp or filament bulb, etc., (optionally) with an associated support, casing or other such housing. Each of the lighting devices 110 a-d may take any of a variety of forms, e.g. a ceiling mounted lighting device, a wall-mounted lighting device, a wall washer, or a free-standing lighting device (and the lighting devices need not necessarily all be of the same type). In this exemplary figure, the lighting devices 110 a-c are ceiling mounted and the lighting device 110 d is a free-standing lighting device. The system 100 may contain any number/type of the lighting devices 110 a-d.

The lighting devices 110 a-d may be controlled based on a first set of control parameters. The controlling of the lighting devices 110 a-d may comprise controlling one or more of: color, color temperature, intensity, beam width, beam direction, illumination intensity, and/or other parameters of one or more of the light sources (not shown) of the lighting devices 110 a-d. The lighting devices 110 a-d may be controlled based on a second set of control parameters. A first and a second light effect may be rendered when the lighting devices 110 a-d are controlled based on the first and the second set of control parameters respectively. The second set of control parameters may be different from the first set of control parameters such that the difference between the first light effect and the second light effect is perceivable by a user 120. In a simple example, the light effect is a brightness level of the lighting devices 110 a-d, for instance, the first light effect is a 30% brightness level, and the second light effect is a 70% brightness level. The second light effect, i.e. 70% brightness level, is determined such that the difference between the first light effect and the second light effect is perceivable by a user 120. For example, the selection of 70% brightness level is based on an ambient light level in the environment 101 such that a difference of 40% in brightness levels is perceivable by a user 120. In another example, the controlling of the lighting devices 110 a-d based on the first set of control parameters provides no light output.

In an example, the light effect comprises light scenes which can be used to enhance, e.g., entertainment experiences such as audio-visual media, or to set an ambience and/or a mood of a user 120. For instance, for the Philips Hue connected lighting system, the first light effect is an ‘enchanted forest’ light scene and the second light effect is a ‘Go-to-sleep’ light scene. The first and/or the second light effect may comprise a static light scene. The first and/or the second light effect may comprise a dynamic light scene, wherein the dynamic light scene comprises light effects which change with time. For the dynamic light scene, the first and/or the second light effect may comprise a first light state and a second light state. The first light state may comprise a first (predefined) pattern and the second light state may comprise a second (predefined) pattern. The pattern may comprise a duration, level of dynamism of the light effects etc. The first and the second light states may be related to a first and a second subset of the second set of control parameters respectively. In such an example, the training of the machine comprises automatizing the (first and/or second) subsets of the second set of control parameters.

The system 100 may further comprise a presence sensing means, which is exemplary a presence sensor 140 in the figure. The system 100 may comprise any number of presence sensors. The presence sensing means may comprise a single device 140 or may comprise a presence sensing system comprising one or more devices arranged for detecting user presence. The system 100 may comprise sensors (not shown) of other modalities such as a light sensor for detecting ambient light levels, a temperature sensor, a humidity sensor, a gas sensor such as a CO2 sensor, a particle measurement sensor, and/or an audio sensor. The presence sensor 140 may be arranged for sensing a signal indicative of a presence of a user 120. The presence sensor 140 may be a passive infrared sensor, an active ultrasound sensor or an imaging sensor such as a camera. Any sensing method or presence sensor known in the art to detect user presence may be used for detecting the presence of a user 120. The user presence may be detected based on presence signals output from a plurality of different presence sensors. The user 120 presence may be detected in the environment 101. The presence detection may be performed continuously, periodically or at random times. In an example, the second light effect may be rendered subsequently upon detecting the presence of the user 120. Alternatively, the second light effect may be rendered prior to the user presence detection.

A signal indicative of the field of view of the user 120 may be received or the field of view may be determined based on an orientation signal output from an orientation sensor (not shown) which is able to detect the orientation of the user 120. The field of view of the user 120 may be determined based on a user position. In an example, the user position may comprise an absolute location of the user 120 in the environment 101. In another example, the user position may comprise a relative position of the user 120 with respect to one or more lighting devices and/or with respect to the first and the second rendered light effects. The position of the user 120 may be determined using the presence sensing means or by other means known in the art to detect position. The lighting devices 110 a-d which have illumination in the field of view of the user 120 may be determined. The determined lighting devices 110 a-d may be controlled to render the second light effect based on the second set of control parameters in the field of view of the user 120. In an example, a signal indicative of a trajectory of a (moving) user 120 may be received or alternatively a trajectory of a (moving) user 120 may be determined, e.g. by using an imaging device such as a camera. The lighting devices 110 a-d having illumination in the trajectory of the user 120 may be determined. The determined lighting devices 110 a-d may be controlled to render the second light effect based on the second set of control parameters in the trajectory of the (moving) user 120.
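Determining which lighting devices have illumination in the field of view can be sketched with simple two-dimensional geometry. The model below is an assumption made for illustration: the user is a point with a heading angle, the field of view is a symmetric angular sector, and each lighting device is represented by the position of its illuminated spot. None of these names or modelling choices come from the source.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def devices_in_field_of_view(
    user_position: Point,
    heading_rad: float,
    fov_rad: float,
    illuminated_spots: Dict[str, Point],
) -> List[str]:
    """Return ids of devices whose illuminated spot lies inside the view sector."""
    visible = []
    for device_id, (x, y) in illuminated_spots.items():
        # Direction from the user to the illuminated spot.
        angle = math.atan2(y - user_position[1], x - user_position[0])
        # Signed angular difference to the heading, wrapped to [-pi, pi).
        diff = (angle - heading_rad + math.pi) % (2 * math.pi) - math.pi
        if abs(diff) <= fov_rad / 2:
            visible.append(device_id)
    return visible


if __name__ == "__main__":
    spots = {"110a": (1.0, 0.0), "110b": (-1.0, 0.0), "110c": (1.0, 1.0)}
    # User at the origin, looking along the positive x axis, 120 degree view.
    print(devices_in_field_of_view((0.0, 0.0), 0.0, math.radians(120), spots))
```

Only the devices returned by such a selection would then be controlled based on the second set of control parameters, so that the rendered second light effect is observable by the user.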

A response of the user 120 may be monitored related to the second set of control parameters, wherein the response is monitored during a time period. The time period may start upon detecting the user presence and controlling the one or more lighting devices to render the second light effect. The time period may be ceased when the presence is no longer detected. The sequence of these two conditions, i.e. user presence detection and rendering of the second light effect, may be different, such that the user presence is detected first and the second light effect is rendered later or the other way around. Alternatively, both conditions may occur at the same time. The time period may be predetermined and may be selected such that the time period is above a threshold value. In an example, if the time period is less than a threshold value, for instance, it starts upon detecting presence and ceases when the presence is no longer detected, but the time period is less than a threshold value, the monitored response is discarded. An example of such a situation may be when the user 120 is not present in the environment 101 for a long time, e.g. (s)he has come to pick up something and has left the environment 101. In these situations, though the lighting devices 110 a-d may have been controlled to render the second light effect, the user 120 might not have observed the second light effect. Therefore, the monitored response for such a small time period is discarded. In another example, the time period may be based on an activity of the user 120. For example, an imaging sensor (e.g. camera) may be used to detect an activity of the user 120. The time period is ceased when the activity is no longer detected, e.g. if the user 120 is sleeping. The time period may be based on the rendered light effect, e.g. if the one or more lighting devices 110 a-d are turned off, i.e. controlled to not provide illumination to the environment 101, the monitored response in such situations may be discarded. The monitored response, for the dynamic light scene, may be related to the first light state and/or to the second light state.
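The discard rules described above, i.e. dropping responses from time periods shorter than a threshold or from periods in which the lighting devices provided no illumination, can be sketched as a single predicate. The function name, the parameter names and the example threshold of 60 seconds are hypothetical; the source does not give a threshold value.

```python
def keep_monitored_response(
    duration_s: float,
    min_duration_s: float,
    lights_on: bool,
) -> bool:
    """Return True if the monitored response should be used for evaluation."""
    # With the lighting devices turned off there is no second light effect
    # for the user to react to, so the response is discarded.
    if not lights_on:
        return False
    # A very short period suggests the user did not observe the effect.
    return duration_s >= min_duration_s


if __name__ == "__main__":
    # Assumed threshold of 60 seconds, for illustration only.
    print(keep_monitored_response(duration_s=15.0, min_duration_s=60.0, lights_on=True))
    print(keep_monitored_response(duration_s=600.0, min_duration_s=60.0, lights_on=True))
```

Filtering before feedback evaluation keeps brief visits, such as a user picking something up and leaving, from being counted as positive feedback by inaction.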

A feedback of the user 120 may be evaluated based on the monitored response of the user 120. A positive feedback may be assigned if no active response has been monitored. A positive feedback represents that the user 120, whose presence has been detected in the environment, is pleased with the second light effect and that is why (s)he does not want to provide any active response related to the second set of control parameters. For evaluating the feedback, e.g. for evaluating ‘inaction’ as positive feedback, the user presence is a key element. An active response may comprise controlling the lighting devices 110 a-d to render a third light effect based on a third set of control parameters based on a received user input; and wherein the feedback may be negative if said active response is monitored. The third set of control parameters may be the first set of control parameters, which indicates that the user 120 reverts the second light effect back to the first light effect because (s)he does not like the second light effect. Therefore, a negative feedback may be assigned to such an active response. For the dynamic light scene, the active response may comprise changing the second light state, which may indicate that the user is not pleased with the second light state. The system may further comprise a wall-switch 130 which may be arranged for controlling the lighting devices 110 a-d. For example, the user 120 uses the wall-switch 130 to control the lighting devices 110 a-d. Alternatively, the user 120 may use a voice command 133 or his/her mobile device 136 to control the lighting devices 110 a-d. In an example, the active response may comprise actuating at least one actuator (not shown), by the user 120, related to the rendered second light effect. The at least one actuator may be a like/dislike button or undo button, e.g. on the user's mobile device 136, to indicate his/her preference. The at least one actuator may be used to control the lighting devices 110 a-d.

A machine may be trained based on the evaluated feedback. Machine learning algorithms, such as supervised learning, e.g. SVM, decision forest etc., may be used to train the machine. Reinforcement learning, as further discussed in FIG. 4, may be used to train the machine. For reinforcement learning, the positive feedback may be a positive reward and the negative feedback may be a negative reward (or punishment). The learning algorithm may comprise iterative learning such that the determination of the second set of control parameters may be based on a prior evaluated feedback, wherein the algorithm iteratively trains the machine. The training may comprise different phases, such as a feedback phase, wherein in the feedback phase the feedback is evaluated based on the monitored response of the user 120. The length of the feedback phase may comprise the time period, which is assumed to be long enough to capture sufficient information needed for training. Subsequent to the feedback phase, a training phase may be started. The training phase may be defined as the phase during which the machine is trained. For iterative learning, the feedback phase and the training phase may be iteratively used. In an example, the feedback phase may be the first week when the lighting devices 110 a-b have been, e.g. initially installed, commissioned and the user 120 has started using them. The duration of the feedback phase may be defined by the user 120. In an example, the training phase may comprise a learning phase and a fine-tuning phase; wherein in the learning phase, the second set of control parameters may be learnt based on a user feedback. In the fine-tuning phase, the second set of control parameters may be further optimized based on further user inputs. In an example, an identity of the user 120 is determined, for instance, by the imaging sensor. The determination of the second set of control parameters may be based on the determined identity, e.g. based on the preference of the identified user.

FIG. 2 shows schematically and exemplary an embodiment of a controller 210 for training a machine for automatizing lighting control actions. The controller 210 may comprise an input unit 214 and an output unit 215. The input 214 and the output 215 units may be comprised in a transceiver (not shown) arranged for receiving (input unit 214) and transmitting (output unit 215) communication signals. The communication signal may comprise control instructions to control the lighting devices 110 a-d. The input unit 214 may be arranged for receiving communication signals from the switch 130 and/or from the voice command 133. The input unit 214 may be arranged for receiving the communication signals from the user mobile device 136. The communication signals may comprise control signals. The controller 210 may further comprise a memory 212 which may be arranged for storing communication IDs of the lighting devices 110 a-d and/or the sensor 140 etc. The controller 210 may comprise a processor 213 arranged for training the machine.

The controller 210 may be implemented in a unit separate from the lighting devices 110 a-d and the sensor 140, such as a wall panel, a desktop computer terminal, or even a portable terminal such as a laptop, tablet or smartphone. Alternatively, the controller 210 may be incorporated into the same unit as the sensor 140 and/or the same unit as one of the lighting devices 110 a-d. Further, the controller 210 may be implemented in the environment 101 or remote from the environment (e.g. on a server); and the controller 210 may be implemented in a single unit or in the form of distributed functionality distributed amongst multiple separate units (e.g. a distributed server comprising multiple server units at one or more geographical sites, or a distributed control function distributed amongst the lighting devices 110 a-d or amongst the lighting devices 110 a-d and the sensor 140). Furthermore, the controller 210 may be implemented in the form of software stored on a memory (comprising one or more memory devices) and arranged for execution on a processor (comprising one or more processing units), or the controller 210 may be implemented in the form of dedicated hardware circuitry, or configurable or reconfigurable circuitry such as a PGA or FPGA, or any combination of these.

Regarding the various communications involved in implementing the functionality discussed above, for example to enable the controller 210 to receive the presence signal output from the presence sensor 140 and to control the light output of the lighting devices 110 a-d, these may be implemented by any suitable wired and/or wireless means, e.g. by means of a wired network such as an Ethernet network, a DMX network or the Internet; or a wireless network such as a local (short range) RF network, e.g. a Wi-Fi, ZigBee or Bluetooth network; or any combination of these and/or other means.

FIG. 3 shows schematically and exemplarily a flowchart illustrating an embodiment of a method 300 for training a machine for automatizing lighting control actions. The method 300 may comprise controlling 310 one or more lighting devices 110 a-d based on a first set of control parameters, e.g. to render a first light effect. The one or more lighting devices 110 a-d may be further controlled 320 based on a second set of control parameters, e.g. to render a second light effect. The light effect(s) may comprise a level of luminance, chrominance, saturation, color-balance, and/or a light scene etc. The control 310-320 of the lighting devices 110 a-d may be in a field of view of a user 120. The control 310-320 of the lighting devices 110 a-d may be in a trajectory of the user 120. The control of the lighting devices 110 a-d may be relative to a position of the user 120. The position of the user 120 may be determined using the presence sensing means. The second set of control parameters may be different from the first set of control parameters, for instance, in such a way that the difference between the first light effect and the second light effect is perceivable by the user 120. Presence of the user 120 may be detected 330 based on a presence signal output from a presence sensing means. The presence sensing means may comprise image sensing such as a camera, RF-based presence sensing etc. The method 300 may further comprise monitoring 340 a response of the user 120 related to the second set of control parameters, wherein the response may be monitored during a time period. In an example, the time period may be based on the controlling 320 of the lighting devices 110 a-d to render the second light effect and on the detection 330 of user presence. In another example, an activity of the user 120 may be determined and the time period may be based on the determined activity of the user 120.
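
By way of a non-limiting illustration, the determination of the time period may be sketched as follows. The sketch assumes that the time period starts once the second light effect has been rendered and the user presence has been detected, and ends when the presence is no longer detected. The `Event` type, its field names and the event kinds are hypothetical and serve only this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """One timestamped observation (hypothetical structure for this sketch)."""
    t: float    # time in seconds
    kind: str   # "second_effect", "presence", "absence" or "user_input"


def monitoring_window(events: List[Event]) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the time period, or None if it never starts.

    Assumption: the period starts when the second light effect is rendered
    AND the user is present; it ends when presence is no longer detected.
    """
    effect_on = False
    present = False
    start: Optional[float] = None
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "second_effect":
            effect_on = True
        elif ev.kind == "presence":
            present = True
        elif ev.kind == "absence":
            present = False
            if start is not None:
                return (start, ev.t)
        if start is None and effect_on and present:
            start = ev.t
    if start is None:
        return None
    # Presence was never lost: close the window at the last observed event.
    return (start, max(e.t for e in events))
```

In this sketch, a user input occurring between the returned start and end times would be a candidate active response; inputs outside the window are ignored.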

The method 300 may further comprise evaluating 350 feedback of the user 120 based on the monitored response. The feedback may be positive if no active response has been monitored. An active response may comprise controlling the one or more lighting devices 110 a-d to render a third light effect based on a third set of control parameters based on a received user input; and wherein the feedback may be negative if said active response is monitored.
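
By way of a non-limiting illustration, the evaluating 350 may be sketched as follows. The sketch assumes that an active response is represented by the timestamp of any user input that changes the light effect, and that the time period is given as a (start, end) pair. The function name and the numeric values chosen for positive and negative feedback are hypothetical.

```python
from typing import Iterable, Tuple

POSITIVE = 1    # no active response during the time period
NEGATIVE = -1   # the user changed the light effect during the time period


def evaluate_feedback(
    window: Tuple[float, float],
    user_input_times: Iterable[float],
) -> int:
    """Return POSITIVE if no user input falls inside the time period.

    Assumption: each timestamp marks a user input (e.g. switch, voice
    command or mobile device) that leads to a third set of control
    parameters being applied.
    """
    start, end = window
    for t in user_input_times:
        if start <= t <= end:
            return NEGATIVE
    return POSITIVE
```

The absence of any input is thus treated as implicit approval, so that the user does not have to give explicit feedback for each automatic action.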

The method may further comprise training 360 the machine based on the evaluated feedback. Machine learning algorithms may be used to train the machine. For example, supervised learning may be used. Supervised learning is the machine learning task of learning a function or model that maps an input to an output based on example input-output pairs. It infers a function from a labeled training data set comprising a set of training examples. In supervised learning, each sample in the training data set is a pair consisting of an input (e.g. a vector) and a desired output value. For instance, the evaluated feedback is the output, and the second set of control parameters is the input vector. The training data set comprises the output (feedback) and the input (the second set of control parameters). A supervised learning algorithm, such as a support vector machine (SVM), decision tree (random forest) etc., analyzes the training data set and produces an inferred function or model, which can be used for making predictions based on a new data set. In this example, a binary classifier machine may be trained, which may predict the user 120 preference for a new set of control parameters. If the model predicts a positive user preference for the new set of control parameters, the lighting devices 110 a-d may be controlled to render a new light effect based on the new set of control parameters. As an alternative to supervised learning, reinforcement learning may be used to train the machine, as further discussed with reference to FIG. 4. Other learning algorithms known in the art, such as rule-based learning, probabilistic reasoning or fuzzy logic, may also be considered for training a machine for automatizing lighting control actions.
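
By way of a non-limiting illustration, the training of a binary classifier may be sketched as follows. For self-containment, the sketch uses a simple logistic-regression classifier trained by gradient descent in place of an SVM or a decision forest; this substitution, the function names, the feature scaling and the hyperparameters are assumptions of the example only.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_classifier(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 2000,
    lr: float = 0.5,
) -> List[float]:
    """Fit weights [bias, w1, ..., wn] by batch gradient descent.

    samples: control-parameter vectors (inputs), scaled to roughly 0..1.
    labels:  evaluated feedback (outputs), 1 = positive, 0 = negative.
    """
    n = len(samples[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(samples, labels):
            err = _sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad)]
    return w


def predicts_positive(w: Sequence[float], x: Sequence[float]) -> bool:
    """True if the model predicts positive feedback for parameters x."""
    return _sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) >= 0.5
```

In use, each input vector might hold, e.g., a normalized brightness and a normalized color temperature, and a new set of control parameters would be applied only if `predicts_positive` returns true for it.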

The method 300 may be executed by computer program code of a computer program product when the computer program product is run on a processing unit of a computing device, such as the processor 213 of the controller 210.

FIG. 4 shows schematically and exemplarily a machine learning approach, i.e. reinforcement learning, for training a machine for automatizing lighting control actions. Reinforcement learning is a machine learning approach to optimize a policy, e.g. lighting control actions, by maximizing an ultimate reward r_(t) through feedback. The ultimate reward r_(t) is in the form of rewards and punishments for a sequence of actions a_(t), e.g. rendering of a light effect. In reinforcement learning, the machine perceives its environment's state s_(t) as a vector of features. The machine can execute actions in every state s_(t) and, based on the action a_(t), receives either a reward or a punishment r_(t+1) and moves to another state s_(t+1). The goal of reinforcement learning is to learn a policy, that is, the prescription of the optimal action a_(t) to execute in each state s_(t). The action is optimal if it maximizes the average reward.

In an example, the agent 410 is the lighting system comprising one or more lighting devices 110 a-d. The agent 410 may be arranged for taking actions a_(t), e.g. controlling the one or more lighting devices 110 a-d to render the first and/or the second light effects. The environment 420 may be an office, a garden, a factory, a house, a grocery store or a hospital. The action a_(t) of the agent 410 affects the environment 420, i.e. the controlling of the one or more lighting devices 110 a-d changes the illumination in the environment.

A presence of a user 120 may be detected and a response of the user 120 related to the second set of control parameters may be monitored. The response may be monitored during a time period. Feedback may be evaluated, wherein a positive feedback may be assigned if no active response has been monitored. An active response may comprise controlling the one or more lighting devices 110 a-d to render a third light effect based on a third set of control parameters based on a received user input; and wherein the feedback is negative if said active response is monitored. The positive feedback may indicate that the user is pleased with the second light effect, whereas the negative feedback may indicate that the user is not pleased and prefers to change the second light effect to a third light effect. The ultimate reward r_(t) may comprise reward and punishment corresponding to, in this example, the positive feedback and the negative feedback respectively.

The state s_(t) is a light effect, e.g. the first light effect, the second light effect etc. With every action a_(t) of the agent 410, the user 120 may be monitored and the feedback is evaluated as reward and punishment. Based on the reward and punishment, reinforcement learning optimizes the light effect, i.e. action of the agent 410 for the user 120 in the environment 420. Different reinforcement learning algorithms such as Q-Learning, State-Action-Reward-State-Action (SARSA), Deep Q Network (DQN), Deep Deterministic Policy Gradient (DDPG) etc. may be used for training the machine for automatizing lighting control actions.
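
By way of a non-limiting illustration, tabular Q-Learning over a small set of light effects may be sketched as follows. The sketch assumes that the state is the index of the currently rendered light effect, that the action is the index of the light effect to render next, and that the reward is +1 for positive feedback and -1 for negative feedback. The function names, the hyperparameters and the `reward_fn` callback standing in for the evaluated user feedback are hypothetical.

```python
import random
from typing import Callable, List


def q_learning(
    n_effects: int,
    reward_fn: Callable[[int, int], float],
    episodes: int = 500,
    steps: int = 10,
    alpha: float = 0.1,
    gamma: float = 0.9,
    epsilon: float = 0.2,
    seed: int = 0,
) -> List[List[float]]:
    """Tabular Q-Learning over light effects.

    State s  : index of the currently rendered light effect.
    Action a : index of the light effect to render next.
    reward_fn(s, a): +1 (no active response) or -1 (user overrode).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_effects for _ in range(n_effects)]
    for _ in range(episodes):
        s = rng.randrange(n_effects)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_effects)                      # explore
            else:
                a = max(range(n_effects), key=lambda i: q[s][i])  # exploit
            r = reward_fn(s, a)
            s_next = a  # the rendered effect becomes the new state
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q: List[List[float]]) -> List[int]:
    """Best next light effect for each currently rendered light effect."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

In a deployed system, `reward_fn` would be replaced by the feedback evaluated from the monitored response of the user 120 rather than by a simulated preference.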

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.

In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb “comprise” and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer or processing unit. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Aspects of the invention may be implemented in a computer program product, which may be a collection of computer program instructions stored on a computer readable storage device which may be executed by a computer. The instructions of the present invention may be in any interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs) or Java classes. The instructions can be provided as complete executable programs, partial executable programs, as modifications to existing programs (e.g. updates) or extensions for existing programs (e.g. plugins). Moreover, parts of the processing of the present invention may be distributed over multiple computers or processors or even the ‘cloud’.

Storage media suitable for storing computer program instructions include all forms of nonvolatile memory, including but not limited to EPROM, EEPROM and flash memory devices, magnetic disks such as the internal and external hard disk drives, removable disks and CD-ROM disks. The computer program product may be distributed on such a storage medium, or may be offered for download through HTTP, FTP, email or through a server connected to a network such as the Internet.

1. A method for training a machine for automatizing lighting control actions, wherein the method comprises the steps of: controlling one or more lighting devices based on a first set of control parameters; controlling the one or more lighting devices based on a second set of control parameters; wherein the second set of control parameters is different from the first set of control parameters; detecting presence of a user based on a presence signal output from a presence sensing means; monitoring a response of the user related to the second set of control parameters; wherein monitoring comprises observing the user reaction to the second set of control parameters; wherein the response is monitored during a time period while the user presence has been detected; wherein the time period is ceased when the presence is no longer detected; evaluating feedback of the user based on the monitored response; wherein the feedback is positive if no active response has been monitored; wherein an active response comprises controlling the one or more lighting devices based on a third set of control parameters based on a received user input; and wherein the feedback is negative if said active response is monitored; and training the machine based on the evaluated feedback.
2. The method according to claim 1, wherein controlling the one or more lighting devices based on the second set of control parameters is triggered based on the user presence detection.
3. The method according to claim 1, wherein the time period starts upon detecting the user presence and controlling the one or more lighting devices based on the second set of control parameters.
4. The method according to claim 1, wherein the method further comprises detecting an activity of the user; and wherein the time period is based on the detected activity such that the time period is ceased when the activity is no longer detected.
5. The method according to claim 1, wherein the method further comprises: determining an identity of the user; and determining the second set of control parameters based on the determined identity.
6. The method according to claim 1, wherein the method further comprises: determining the second set of control parameters based on a prior evaluated feedback.
7. The method according to claim 1, wherein the method further comprises: receiving a signal indicative of a field of view of the user; determining one or more lighting devices with illumination in the field of view of the user; controlling the determined one or more lighting devices based on the second set of control parameters in the field of view of the user.
8. The method according to claim 1, wherein the third set of control parameters comprises the first set of control parameters.
9. The method according to claim 1, wherein an active response comprises actuating at least one actuator, by the user, related to the second set of control parameters.
10. The method according to claim 1, wherein the one or more lighting devices are controlled based on the second set of control parameters prior to the user presence detection.
11. The method according to claim 1, wherein the machine is trained using reinforcement learning, and wherein the positive feedback is a positive reward and the negative feedback is a negative reward.
12. A controller for training a machine for automatizing lighting control actions; wherein the controller comprises a processor arranged for executing the steps of method according to claim 1.
13. A lighting system for training a machine for automatizing lighting control actions comprising a plurality of lighting devices arranged for illuminating an environment; a controller according to claim 12.
14. A non-transitory computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method of claim 1.